WTF: read interrupted SP from ucontext in signalHandlerSuspendResume#235
WTF: read interrupted SP from ucontext in signalHandlerSuspendResume#235robobun wants to merge 1 commit into
Conversation
|
Small suggestion: now that the handler is robust to SA_ONSTACK via ucontext SP, the comment on the SIGPWR override (from Suggested replacement: #if OS(LINUX) && USE(BUN_JSC_ADDITIONS)
// Thread suspension on Linux uses a signal (macOS uses Mach ports instead —
// this entire signal-based mechanism is Linux/FreeBSD-only). We override the
// default SIGUSR1 to SIGPWR because npm packages commonly register
// process.on('SIGUSR1') handlers. SIGPWR ("power failure") is effectively
// unused on modern Linux.
//
// Note: dlopen'd runtimes (Go cgo, Mono, JNI) may add SA_ONSTACK to this
// handler's sigaction flags via their init routines. The handler reads the
// interrupted SP from ucontext rather than its own frame pointer, making it
// robust to alt-stack delivery regardless of flag mutation.
// See oven-sh/bun#31158.
g_wtfConfig.sigThreadSuspendResume = SIGPWR;
#endifThis ties together the "why SIGPWR", "why Linux-only", and "why it's safe" in one place. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📥 CommitsReviewing files that changed from the base of the PR and between a355e3e0cc5b7d4d23b9a8195368bf153592a61a and 3593e2c. 📒 Files selected for processing (1)
WalkthroughThe signal-handler suspend/resume mechanism is improved to detect the interrupted thread's stack pointer from machine-context ChangesSignal handler stack pointer reliability
Possibly related issues
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Comment |
Preview Builds
|
Point WEBKIT_VERSION at the preview autobuild of oven-sh/WebKit#235, which teaches signalHandlerSuspendResume to read the interrupted thread's SP from the ucontext instead of the handler's own currentStackPointer(). That's the root-cause fix — regardless of what any dlopen'd library (Go cgo, Mono, JNI, Rust cdylib, …) does to SA_ONSTACK, the stack-range check always sees the interrupted stack rather than wherever the handler landed. Delete the per-dlopen sigaction scan on the Bun side. With the WTF change in place it's pure overhead, and the reporter rightly pointed out that chaining workarounds for workarounds is the wrong direction. Regression test (test/regression/issue/31158.test.ts) is unchanged and now exercises the upstream fix directly. Swap the preview-pr-235 tag for the merged autobuild hash once oven-sh/WebKit#235 lands.
925c056 to
0eedb19
Compare
0eedb19 to
e478893
Compare
71211c9 to
71e0011
Compare
71e0011 to
66e7173
Compare
…e7173c) Rebase of the signalHandlerSuspendResume ucontext-SP fix onto the current WebKit main pin 09f04cd5, plus a cross-reference comment tying interruptedStackPointer() to JSC::MachineContext::stackPointerImpl so a future arch addition updates both copies.
signalHandlerSuspendResume validated the snapshot by calling currentStackPointer(), the SP of the handler frame itself. That assumes the handler always runs on the thread's normal stack, which fails when SA_ONSTACK is set on our sigaction and the thread has a sigaltstack installed (Go's cgo initsig does this to inherited handlers; combined with ASAN/crash-handler alt stacks the check fails on every retry and Thread::suspend() spins forever). Read the interrupted SP from the ucontext instead, which is stable regardless of which stack the handler runs on. See oven-sh/bun#31158, oven-sh/bun#29843.
66e7173 to
3593e2c
Compare
| // stays correct because it reads the interrupted stack pointer from the ucontext rather | ||
| // than its own frame pointer. See oven-sh/bun#31158. |
There was a problem hiding this comment.
🟡 The comment says the handler reads the interrupted SP from ucontext "rather than its own frame pointer", but the pre-fix code called currentStackPointer(), which reads the stack pointer (SP/RSP), not the frame pointer (FP/RBP). The parallel comment inside signalHandlerSuspendResume correctly says "the handler's own SP"; suggest matching that here: "…rather than the handler's own stack pointer."
Extended reasoning...
What the comment says vs. what the code did
The new comment in initializePlatformThreading() reads:
signalHandlerSuspendResume stays correct because it reads the interrupted stack pointer from the ucontext rather than its own frame pointer.
The contrast being drawn is between the new behaviour (read the interrupted thread's SP from ucontext->uc_mcontext) and the old behaviour. But the old behaviour was a call to currentStackPointer() — defined in wtf/StackPointer.h — which returns the architectural stack pointer (RSP on x86_64, sp on arm64, etc.). It does not read the frame pointer (RBP / FP). So the comment names the wrong register.
Why the distinction matters here
In most prose, "frame pointer" used loosely for "a pointer into the current frame" would be harmless. But this is a file that has just added interruptedStackPointer(), which literally pulls gregs[REG_RSP] / .sp / .arm_sp / mc_rsp out of mcontext_t, and the whole PR is about distinguishing which SP gets checked. In that context, SP vs. FP is not loose terminology — a reader familiar with the difference may pause and wonder whether the old code did FP-based introspection (it didn't), or whether there's some FP-vs-SP subtlety in the alt-stack case (there isn't).
There's also an internal inconsistency: the parallel comment inside signalHandlerSuspendResume itself, added in the same diff, gets it right — "Fall back to the handler's own SP on platforms where we don't have a machine context." The two comments describe the same fallback and should use the same term.
Addressing the refutation
One verifier argued this is below the bar because the intent is clear from context and the wording came from a reviewer's suggested comment. Both points are true — no one will misread the mechanism. But the fix is a one-word change in a comment that is specifically about register-level behaviour, and the same PR already uses the correct term twenty lines up. Aligning the two costs nothing and removes a small "wait, frame pointer?" stumble for future readers. That's squarely nit territory, and it's filed as such.
Step-by-step
- Pre-fix code:
void* approximateStackPointer = currentStackPointer();— reads SP of the handler's current frame. - Post-fix code: reads
interruptedStackPointer(userContext)from ucontext, falling back tocurrentStackPointer(). - The comment at line 245 describes (2) as "reads the interrupted stack pointer from the ucontext rather than its own frame pointer".
- Neither (1) nor the fallback in (2) ever touched the frame pointer; both read the stack pointer.
- Therefore "frame pointer" → "stack pointer" (or "the handler's own stack pointer", matching the in-handler comment).
Impact
None at runtime — comment-only. Documentation accuracy in low-level register-handling code.
Fix
- // stays correct because it reads the interrupted stack pointer from the ucontext rather
- // than its own frame pointer. See oven-sh/bun#31158.
+ // stays correct because it reads the interrupted stack pointer from the ucontext rather
+ // than the handler's own stack pointer. See oven-sh/bun#31158.
Fixes the SIGPWR suspend-loop deadlock reported in oven-sh/bun#31158 (and oven-sh/bun#29843 — same mechanism via Prisma's WASM-backed adapter).
The bug
Thread::suspend()sends the GC suspend-resume signal to the target thread and then spins:The handler decides whether to publish a register snapshot by checking
currentStackPointer()— the SP of the handler's own frame — againstthread->m_stack:That only works so long as the handler runs on the normal stack. If
SA_ONSTACKis set on the sigaction and the thread has asigaltstackinstalled, the handler runs on the alt stack, the check fails on every retry, and the loop spins forever.Who trips this
Go's cgo runtime.
runtime.initsigwalks every signal and, for any handler it didn't install itself, callssetsigstackto force-addSA_ONSTACKso Go's own threads' synchronous faults stay on a managed alt stack. That includes our SIGPWR handler. Any c-shared library that follows the same recipe (Mono, some JNI layouts, Rust cdylibs) hits it too.Combined with sanitizer runtimes / libbacktrace / the host's crash handler installing an alternate signal stack on the main thread (ASAN does this unconditionally), the next GC thread-suspend delivers SIGPWR onto the alt stack → stack check fails every retry → the suspender wedges at 100% CPU.
Reproducible in a plain C program to confirm
currentStackPointervs the ucontext-saved SP:The fix
Read the SP the thread was running at when the signal arrived straight out of the ucontext (kernel-saved register state). That SP is stable regardless of whether the handler itself runs on the normal or the alt stack, and the existing retry-on-alt-stack semantics still work: if the thread genuinely was on an alt stack when the signal arrived (e.g. a nested handler), the ucontext SP reflects that and the check backs off as before.
currentStackPointer()stays as the fallback for non-HAVE(MACHINE_CONTEXT) platforms.Test
Regression test coming from the Bun side (oven-sh/bun#31161) — spawns a child that sets
SA_ONSTACKon the suspend-resume signal the same way Go'sinitsigdoes, then drives the WASM install path that triggersresetInstructionCacheOnAllThreads. Passes with this fix applied; hangs to the test-runner timeout without it.